
Fast estimation of Gaussian mixture components via centering and singular value thresholding

Qing, Huan

arXiv.org Machine Learning

Estimating the number of components is a fundamental challenge in unsupervised learning, particularly when dealing with high-dimensional data with many components or severely imbalanced component sizes. This paper addresses this challenge for classical Gaussian mixture models. The proposed estimator is simple: center the data, compute the singular values of the centered matrix, and count those above a threshold. No iterative fitting, no likelihood calculation, and no prior knowledge of the number of components are required. We prove that, under a mild separation condition on the component centers, the estimator consistently recovers the true number of components. The result holds in high-dimensional settings where the dimension can be much larger than the sample size. It also holds when the number of components grows to the smaller of the dimension and the sample size, even under severe imbalance among component sizes. Computationally, the method is extremely fast: for example, it processes ten million samples in one hundred dimensions within one minute. Extensive experimental studies confirm its accuracy in challenging settings such as high dimensionality, many components, and severe class imbalance.
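The estimator described above (center the data, compute the singular values of the centered matrix, count those above a threshold) can be sketched in a few lines of NumPy. The paper's data-driven threshold rule is not reproduced here, so the cutoff is passed in as a parameter:

```python
import numpy as np

def estimate_num_components(X, threshold):
    """Count singular values of the column-centered data matrix that
    exceed `threshold`. A sketch of the centering-and-thresholding
    estimator; the paper's specific threshold choice is not reproduced."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values only
    return int(np.sum(s > threshold))
```

On noiseless toy data this count simply equals the numerical rank of the centered matrix; the paper's analysis concerns how a suitable threshold separates signal from noise singular values in the noisy, high-dimensional regime.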


Sparse $\varepsilon$-insensitive zone bounded asymmetric elastic net support vector machines for pattern classification

Du, Haiyan, Yang, Hu

arXiv.org Machine Learning

Existing support vector machine (SVM) models are sensitive to noise and lack sparsity, which limits their performance. To address these issues, we combine the elastic net loss with a robust loss framework to construct a sparse $\varepsilon$-insensitive bounded asymmetric elastic net loss, and integrate it with SVM to build the $\varepsilon$-Insensitive Zone Bounded Asymmetric Elastic Net Loss-based SVM ($\varepsilon$-BAEN-SVM). $\varepsilon$-BAEN-SVM is both sparse and robust. Sparsity is proven by showing that samples inside the $\varepsilon$-insensitive band are not support vectors. Robustness is theoretically guaranteed because the influence function is bounded. To solve the non-convex optimization problem, we design a half-quadratic algorithm based on clipping dual coordinate descent. It transforms the problem into a series of weighted subproblems, improving computational efficiency via the $\varepsilon$ parameter. Experiments on simulated and real datasets show that $\varepsilon$-BAEN-SVM outperforms traditional and existing robust SVMs. It balances sparsity and robustness well in noisy environments. Statistical tests confirm its superiority. Under the Gaussian kernel, it achieves better accuracy and noise insensitivity, validating its effectiveness and practical value.


Individual-heterogeneous sub-Gaussian Mixture Models

Qing, Huan

arXiv.org Machine Learning

The classical Gaussian mixture model assumes homogeneity within clusters, an assumption that often fails in real-world data where observations naturally exhibit varying scales or intensities. To address this, we introduce the individual-heterogeneous sub-Gaussian mixture model, a flexible framework that assigns each observation its own heterogeneity parameter, thereby explicitly capturing the heterogeneity inherent in practical applications. Built upon this model, we propose an efficient spectral method that provably achieves exact recovery of the true cluster labels under mild separation conditions, even in high-dimensional settings where the number of features far exceeds the number of samples. Numerical experiments on both synthetic and real data demonstrate that our method consistently outperforms existing clustering algorithms, including those designed for classical Gaussian mixture models.
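The paper's spectral method itself is not reproduced here, but the general recipe it belongs to — embed the samples via the leading singular vectors of the data matrix, then cluster the embedding — can be sketched as follows. The singular-value scaling of the embedding and the deterministic initialization are illustrative choices, not the paper's:

```python
import numpy as np

def spectral_cluster(X, k, n_iter=100):
    """Generic spectral clustering sketch: Lloyd's k-means iterations
    on the top-k singular-vector embedding of the rows of X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    emb = U[:, :k] * s[:k]  # scale coordinates by singular values
    # Deterministic init: pick k points spread along the first coordinate.
    order = np.argsort(emb[:, 0])
    centers = emb[order[np.linspace(0, len(emb) - 1, k).astype(int)]]
    for _ in range(n_iter):
        dists = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            emb[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

For well-separated clusters the low-rank embedding concentrates each cluster near a distinct point, which is what makes exact-recovery guarantees of the kind claimed in the abstract plausible for such methods.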


Inside China's robotics revolution

The Guardian

An engineer at the AgiBot factory in Shanghai, China, where the 5,000th mass-produced humanoid robot had rolled off the production line. How close are we to the sci-fi vision of autonomous humanoid robots?

Chen Liang, the founder of Guchi Robotics, an automation company headquartered in Shanghai, is a tall, heavy-set man in his mid-40s with square-rimmed glasses. His everyday manner is calm and understated, but when he is in his element - up close with the technology he builds, or in business meetings discussing the imminent replacement of human workers by robots - he wears an exuberant smile that brings to mind an intern on his first day at his dream job. Guchi makes the machines that install wheels, dashboards and windows for many of the top Chinese car brands, including BYD and Nio. He took the name from the Chinese for "steadfast intelligence", though the fact that it sounded like an Italian luxury brand was not entirely unwelcome.

For the better part of two decades, Chen has tried to solve what, to him, is an engineering problem: how to eliminate - or, in his view, liberate - as many workers in car factories as technologically possible. Late last year, I visited him at Guchi headquarters on the western outskirts of Shanghai. Next to the head office are several warehouses where Guchi's engineers tinker with robots to fit the specifications of their customers. Chen, an engineer by training, founded Guchi in 2019 with the aim of tackling the hardest automation task in the car factory: "final assembly", the last leg of production, when all the composite pieces - the dashboard, windows, wheels and seat cushions - come together. At present, his robots can mount wheels, dashboards and windows on to a car without any human intervention, but 80% of the final assembly, he estimates, has yet to be automated.
That is what Chen has set his sights on. As in much of the world, AI has become part of everyday life in China. But what most excites Chinese politicians and industrialists are the strides being made in the field of robotics, which, when combined with advances in AI, could revolutionise the world of work.


An Interpretable and Stable Framework for Sparse Principal Component Analysis

Hu, Ying, Yang, Hu

arXiv.org Machine Learning

Sparse principal component analysis (SPCA) addresses the poor interpretability and variable redundancy often encountered by principal component analysis (PCA) in high-dimensional data. However, SPCA typically imposes uniform penalties on variables and does not account for differences in variable importance, which may lead to unstable performance in highly noisy or structurally complex settings. We propose SP-SPCA, a method that introduces a single equilibrium parameter into the regularization framework to adaptively adjust variable penalties. This modification of the L2 penalty provides flexible control over the trade-off between sparsity and explained variance while maintaining computational efficiency. Simulation studies show that the proposed method consistently outperforms standard sparse principal component methods in identifying sparse loading patterns, filtering noise variables, and preserving cumulative variance, especially in high-dimensional and noisy settings. Empirical applications to crime and financial market data further demonstrate its practical utility. In real data analyses, the method selects fewer but more relevant variables, thereby reducing model complexity while maintaining explanatory power. Overall, the proposed approach offers a robust and efficient alternative for sparse modeling in complex high-dimensional data, with clear advantages in stability, feature selection, and interpretability.



24f8dd1b8f154f1ee0d7a59e368eccf3-Paper-Conference.pdf

Neural Information Processing Systems

Recent studies have shown that the transferability of adversarial examples exists for CNNs, and the same holds true for ViTs.


VastTrack: Vast Category Visual Object Tracking

Neural Information Processing Systems

VastTrack consists of a few attractive properties: (1) Vast Object Category. In particular, it covers targets from 2,115 categories, significantly surpassing the object classes of existing popular benchmarks (e.g., GOT-10k with 563 classes and LaSOT with 70 categories). Through providing such vast object classes, we expect to learn more general object tracking.